The Influence of Chunking on Dependency Crossing and Distance
Abstract
This paper hypothesizes that chunking plays an important role in reducing dependency distance and dependency crossings. Computer simulations, when compared with natural languages, show that chunking reduces the mean dependency distance (MDD) of a linear sequence of nodes (constrained by continuity or projectivity) to that of natural languages. More interestingly, chunking alone also brings about fewer dependency crossings, although it fails to reduce them to the rarity found in human languages. These results suggest that chunking may play a vital role in the minimization of dependency distance and a contributing role in the rarity of dependency crossings. In addition, the results point to the possibility that the rarity of dependency crossings is not a mere side effect of the minimization of dependency distance, but a linguistic phenomenon with its own motivations.

Introduction. – Language used in communication is invariably presented linearly, one unit after another, which is regarded as one of its fundamental properties [1]. However, there is always a syntactic tree structure underlying a one-dimensional linear sentence, a structure underpinning both the production and the comprehension of this sentence [2,3]. Therefore, language processing consists, to a considerable degree, in the transformation between the syntactic tree structure and the one-dimensional linear arrangement. What properties can be found in the tree structure of language? What mechanisms constrain the transformation of tree structure into linear structure? The answers to these questions, which may well require research based on statistical physics and computer simulation, will probably shed much light on how human language operates.

In terms of dependency grammar, the structure of a sentence can be visualized as a hierarchical dependency tree, whose nodes (vertices) are words, linked to one another by directed edges (dependency relations) [2,3].
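As a concrete illustration of the two quantities discussed here, the following minimal sketch encodes a linearized dependency tree as a list of head positions and computes its mean dependency distance and its number of crossing dependencies. The encoding, the function names, and the example sentence are illustrative assumptions, not taken from the paper. The sketch measures the distance of an edge as the absolute difference between the positions of its two words, so adjacent words are at distance 1; under a count of intervening words, each value would be one lower.

```python
from itertools import combinations


def dependency_edges(heads):
    """Return (head, dependent) pairs of 1-based word positions.

    heads[i] is the position of the head of word i + 1; 0 marks the root
    (an assumed encoding, chosen for this sketch).
    """
    return [(h, d) for d, h in enumerate(heads, start=1) if h != 0]


def mean_dependency_distance(heads):
    """Average |head - dependent| over all edges (0.0 if there are none)."""
    edges = dependency_edges(heads)
    if not edges:
        return 0.0
    return sum(abs(h - d) for h, d in edges) / len(edges)


def count_crossings(heads):
    """Count pairs of edges whose arcs cross when drawn above the sentence."""
    spans = [tuple(sorted(edge)) for edge in dependency_edges(heads)]
    crossings = 0
    for (a, b), (c, d) in combinations(spans, 2):
        # Two arcs cross when exactly one endpoint of one arc lies
        # strictly inside the other arc.
        if a < c < b < d or c < a < d < b:
            crossings += 1
    return crossings


# "The dog chased the cat": The->dog, dog->chased, chased = root,
# the->cat, cat->chased (a hypothetical analysis for illustration).
projective = [2, 3, 0, 5, 3]
# A four-word order in which the arcs 1-3 and 2-4 cross.
non_projective = [3, 4, 0, 3]

print(mean_dependency_distance(projective), count_crossings(projective))
print(mean_dependency_distance(non_projective), count_crossings(non_projective))
```

The pairwise crossing check is quadratic in the number of edges, which is adequate for sentence-length inputs.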
Such a hierarchical tree must ultimately be arranged into a linear sequence for the purpose of spoken and written communication. So far, studies have repeatedly observed two phenomena in the linear realization of hierarchical dependency structure: the minimization of dependency distance (the number of intervening words) between two syntactically related words [4-13], and the rarity of crossing dependency relations [14,15]. Liu [5] compared the dependency distance of 20 natural languages with that of two different random languages, and pointed out that dependency distance minimization seems to be universal in human languages. Ferrer-i-Cancho has analyzed this phenomenon theoretically [8,9]. A recent study based on 37 languages has obtained similar findings [11]. Since dependency distance is held to be cognitively related to language processing load [16], the minimization of dependency distance is probably a result of the principle of least effort [17].

In addition, it is argued that the rarity of crossing dependencies is simply a by-product of the pressure to minimize dependency distance and cognitive cost in language processing, having little to do with the syntax of the language [7-10]. Similarly, some studies find that dependency distance will increase significantly if dependency crossings are permitted, and suggest that reducing dependency crossings is probably an important means of restraining dependency distance [4,5]. Dependency distance and crossings are closely related, and in human languages both seem to be subject to minimization. Ferrer-i-Cancho [9,10] has theoretically proven that, for sufficiently short dependency lengths, the probability that two edges cross decreases as their length decreases. However, Liu has found that a projective random language (i.e., one without any crossing dependencies) has a significantly longer mean dependency distance than natural language [4,5]. Therefore,
Journal title:
- CoRR
Volume abs/1509.01310
Publication date 2015